Contributing authors:

Data analysis, code, and maintenance of this notebook: Carmen Lia Murall, Raphaël Poujol, Susanne Kraemer, Arnaud N’Guessan, Sarah Otto, Art Poon, Jesse Shapiro, Fiona Brinkman, Justin Jia, Zohaib Anwar and Erin Gill. Input and direction by other members of Pillar 6 (https://covarrnet.ca/our-team/#pillar-6 ), which include: Caroline Colijn, Jorg Fritz, Morgan Langille, Paul Gordon, Julie Hussin, Jeff Joy, and William Hsiao.

Sequence collection, generation, release, and feedback on analyses: Canadian laboratories as part of the CPHLN and CanCOGeN are making these data publicly available and contribute feedback on analyses presented here. A complete list of lab authors is in this repository, and more details are below in the Acknowledgement section.

Introduction

This notebook is built to explore Canadian SARS-CoV-2 genomic and epidemiological data with the aim of investigating viral evolution and spread. It is for discussion with pillar 6’s team and for sharing with collaborators, e.g. public health labs. These analyses can spur further research within or across pillars, be used for reports (or data dashboards), support discussions with the science communication pillar for public dissemination, and enable code reuse by public health authorities/laboratories for their internal use.

Canadian genomic and epidemiological data will be regularly pulled from various public sources (see list below) to keep these analyses up-to-date. Only representations of aggregate data will be posted here.

Important caveats and disclaimers:
These analyses represent only a snapshot of SARS-CoV-2 evolution in Canada. Only some infections are detected by PCR testing, only some of those are sent for whole-genome sequencing, and not all sequences are posted to public facing reposittories. Sequencing volumes and priorities have changed during the pandemic, and the sequencing strategy is typically a combination of prioritizing outbreaks, travellers, public health investigations, and random sampling for genomic surveillance.

For example, specific variants or populations might be preferentially sequenced at certain times in certain jurisdictions. When possible, these differences in sampling strategies are mentioned but they are not always known. With the arrival of the Omicron wave, many jurisdictions across Canada reached testing and sequencing capacity mid-late December 2021 and thus switched to targeted testing of priority groups (e.g., hospitalized patients, health care workers, and people in high-risk settings). Therefore, from this time onward, case counts are likely underestimated and the sequenced virus diversity is not necessarily representative of the virus circulating in the overall population.

Thus, interpretation of these plots and comparisons between health regions should be made with caution, considering that the data may not be fully representative. These analyses are subject to frequent change given new data and updated lineage designations.

## 1. LOAD processed metadata of Canadian sequences (with latest pangolin, division, and full seq IDs)
#Download metadata from VirusSeq, put the date here:

# this can be made more compact for faster loading
meta <- read.csv(gzfile("data_needed/virusseq.metadata.csv.gz"), sep="\t")
meta$province <- meta$geo_loc_name_state_province_territory

# Select only the column we want to use later
columnlist=c("fasta_header_name", "province", "host_gender", "host_age_bin",
             "sample_collection_date", "sample_collected_by", 
             "purpose_of_sampling", "purpose_of_sequencing","lineage", 
             "raw_lineage")
meta <- meta[ , columnlist]


### metadata cleaning 
unknown.str <- c("Undeclared", "Not Provided", "Restricted Access", "Missing", 
                 "Not Applicable")
meta <- as.data.frame(apply(meta, 2, function(x) {
  x[is.element(x, unknown.str)] <- "Unknown"
  x
}))

meta$sample_collection_date <- as.Date(meta$sample_collection_date)
meta$week <- cut(meta$sample_collection_date, 'week')

source("scripts/scanlineages.R")
meta$pango_group <- create.pango.group(VOCVOI, meta)
meta$pango_group <- as.factor(meta$pango_group)
meta[meta == ""] <- "Unknown"
## 2. LOAD epidemiological data (PHAC)


#from: https://health-infobase.canada.ca/covid-19/epidemiological-summary-covid-19-cases.html?stat=num&measure=total&map=pt#a2
epidataCANall <- read.csv(url("https://health-infobase.canada.ca/src/data/covidLive/covid19-download.csv"))
epidataCANall$date <- as.Date(epidataCANall$date)
epidataCANall$prname <- gsub('_', ' ', epidataCANall$prname)
epidate <- tail(epidataCANall,1)$date #download date

epidataCANall$previousvalue <- 0
#small loop to get the numtoday column from previous versions of this file from the cumulative cases
for(row in 1:nrow(epidataCANall)) {
  p <- epidataCANall[row, "prname"]
  subdf <- epidataCANall[which( 
    (epidataCANall$date > epidataCANall[row, "date"] & epidataCANall$prname==p) 
    ), ]
  if(nrow(subdf) != 0) {
    nextrow <- which( (epidataCANall$date == min(subdf$date) & epidataCANall$prname==p))
    epidataCANall[nextrow, "previousvalue"] <- epidataCANall[row, "totalcases"]
  }
}

epidataCANall$numtoday <- epidataCANall$totalcases - epidataCANall$previousvalue

SARS-CoV-2 in Canada

Current Situation

BQ.1, BQ.1.1 and associated sublineages continue to show the strongest growth, with other newer sublineages as noted below also of potential interest. BA.2.75 and XBB variants, which were previously more associated with the eastern hemisphere of the globe, are starting to increase more notably in some regions in the western hemisphere, including some regions of Canada.

Variants of current interest,, due to their potential growth advantage, or mutations of potential functional significance (note: sublineages of many of these are also of interest). See also the evolving plot below of “Fastest growing lineages in Canada”:

  • BA.2.75
  • BA.5.2.13
  • BA.5.2.24
  • BA.5.2.25
  • BE.1.2.1
  • BF.31.1
  • BF.7
  • BN.1
  • BQ.1
  • BQ.1.1
  • BW.1
  • CA.1
  • CH.1.1
  • CK.2.1.1
  • CR.1
  • CZ.1 and other BQ.1.1 descendants
  • XBB
  • XBB.1
  • and other variants with immune evasion mutations (e.g: S:R346T), particularly those variants with sets of 7 or more of such mutations of interest.

Omicron sublineages in Canada

Here we take a look, sub-dividing the major sub-lineages currently circulating in Canada.

BA.1

Sublineage BA.1

The grey zone represent rare lineages (not in top 15) : BA.1.1.3(1), BA.1.13.1(1), BA.1.15.3(1), BA.1.17.1(1), BD.1(1), BA.1.12(2), BA.1.4(2), BA.1.8(2), BA.1.1.11(4), BA.1.1.4(4), BA.1.1.9(4), BA.1.10(4), BA.1.14.1(4), BA.1.14.2(4), BA.1.7(4), BA.1.1.17(5), BA.1.1.8(9), BA.1.16(10), BA.1.5(13), BA.1.1.13(15), BA.1.1.7(16), BC.2(18), BA.1.15.2(27), BA.1.1.15(46), BA.1.21(55), BA.1.19(100), BA.1.9(127), BA.1.1.1(164), BA.1.13(164), BA.1.15.1(178), BA.1.1.2(281)

early BA.2

Sublineage BA.2

The grey zone represent rare lineages (not in top 15) : BA.2.10.3(1), BA.2.16(1), BA.2.25(1), BA.2.3.8(1), BA.2.38.3(1), BA.2.58(1), BA.2.59(1), BA.2.63(1), BA.2.73(1), BA.2.75.3(1), BM.1(1), BM.4.1(1), BA.2.14(2), BA.2.15(2), BA.2.2.1(2), BA.2.3.9(2), BA.2.30(2), BA.2.4(2), BA.2.42(2), BA.2.71(2), BA.2.9.6(2), BH.1(2), BA.2.11(3), BA.2.12.2(3), BA.2.27(3), BA.2.3.14(3), BA.2.38.2(3), BA.2.45(3), BA.2.53(3), BA.2.54(3), BA.2.55(3), BA.2.62(3), BA.2.68(3), BA.2.70(3), BA.2.9.4(3), BA.2.3.12(4), BA.2.50(4), BA.2.3.15(5), BA.2.34(5), BA.2.75.1(5), BA.2.79(5), BA.2.41(6), BA.2.44(6), BA.2.47(6), BA.2.23.1(7), BA.2.3.13(7), BA.2.35(7), BA.2.64(7), BA.2.81(7), BG.6(7), BA.2.43(8), BA.2.60(8), BA.2.3.5(9), BA.2.3.7(9), BA.2.61(9), BA.2.24(10), BA.2.31.1(10), BA.2.38.1(10), BA.2.9.2(10), BA.2.22(11), BA.2.52(11), BA.2.72(11), BA.2.9.1(11), BA.2.26(12), BA.2.6(12), BA.2.29(13), BA.2.49(13), BA.2.5(14), BA.2.8(14), BG.4(14), BA.2.51(15), BA.2.78(15), BA.2.3.1(16), BA.2.40.1(17), BA.2.13.1(18), BA.2.17(19), BA.2.32(21), BA.2.7(21), BA.2.9.5(21), BA.2.2(24), BA.2.9.3(26), BA.2.48(27), BA.2.74(29), BG.5(29), BA.2.31(36), BA.2.56(39), BA.2.13(48), BA.2.3.17(56), BA.2.75(58), BA.2.36(66), BA.2.3.2(75), BA.2.23(86), BA.2.76(87), BA.2.1(88), BA.2.37(99), BA.2.10.1(104), BG.2(110), BA.2.3.11(123), BA.2.18(128)

late BA.2

Sublineage BA.2

The grey zone represent rare lineages (not in top 15) : BA.2.1(1), BA.2.13.1(1), BA.2.21(1), BA.2.38.2(1), BA.2.40.1(1), BA.2.47(1), BA.2.56(1), BA.2.75.9(1), BA.2.82(1), BA.2.9(1), BG.5(1), BJ.1(1), BL.1.3(1), BM.2(1), BN.2(1), CB.1(1), BA.2.38(2), BA.2.75.6(2), CA.1(2), BA.2.18(3), BA.2.3(3), BA.2.65(3), BL.4(3), BH.1(4), BM.4.1.1(4), BA.2.74(5), BM.1(5), BM.1.1.1(5), BG.2(6), BL.3(6), BN.2.1(6), BP.1(6), BR.3(7), BL.2(10)

BA.4

Sublineage BA.4

The grey zone represent rare lineages (not in top 15) : BA.4.8(1), BA.4.1.10(2), BA.4.1.4(2), BA.4.3(2), BA.4.5(2)

BA.5.1

Sublineage BA.5.1

The grey zone represent rare lineages (not in top 15) : BA.5.1.26(1), BA.5.1.16(2), BA.5.1.9(3), BA.5.1.28(4), BA.5.1.19(8), BA.5.1.8(8), BA.5.1.21(9), BT.1(9), BA.5.1.27(11), BA.5.1.4(14), BA.5.1.17(21), BA.5.1.20(33), BA.5.1.15(40)

BA.5.2

Sublineage BA.5.2

The grey zone represent rare lineages (not in top 15) : BF.23(1), BF.24(1), BF.3.1(1), BV.2(1), CD.1(1), BF.18(2), BF.19(2), BF.22(2), BU.1(3), CG.1(3), BA.5.2.7(6), BA.5.2.12(7), BA.5.2.32(10), BF.1.1(10), CE.1(11), BF.25(14), BA.5.2.4(16), BF.15(17), BF.3(19), BF.6(23), BF.20(24), BA.5.2.19(25), BA.5.2.31(26), BA.5.2.23(28), BA.5.2.18(34), BF.14(35), BF.12(36), BF.2(37), BF.16(39), BA.5.2.14(43), BA.5.2.16(45), BA.5.2.24(54), BA.5.2.26(61), BA.5.2.28(76), BF.13(96), BA.5.2.25(100), BA.5.2.27(121), BA.5.2.13(130), BF.4(182), BF.11(190), BF.8(191), BA.5.2.2(195), BA.5.2.6(197), BA.5.2.8(213), BF.21(216), BA.5.2.3(226)

Other BA.5

Sublineage BA.5 without BA.5.1 and BA.5.2

The grey zone represent rare lineages (not in top 15) : BA.5.3.4(2), BQ.1.4(2), BE.2(7), BA.5.3.3(11), BA.5.5.3(11), BW.1(15), BA.5.6.1(16), BA.5.6.2(22), BA.5.3(23), BE.4.1(23), BA.5.5.1(25), CC.1(29), BA.5.10.1(35), BQ.1.3(35), BA.5.3.2(36), BE.1.2(41), BA.5.9(51), BA.5.5.2(60), BE.1.1.2(62)

Recombinant and divergent lineages

Recombinant and divergent lineages

The grey zone represent rare lineages (not in top 15) : XAF(2), XAP(2), XE(2), XZ(2)

Fastest growing lineages in Canada

Here we show the selection estimates and their 95% confidence intervals for pangolin lineages with more than 10 sequences in Canada since 2022-08-01 and with enough data to estimate the confidence interval. Each selection estimate measures the growth rate relative to BA.5.2 stricto. Plots showing the change in variant frequency over time in Canada are given below for lineages with more than 50 sequences.

Plot (stricto)

Plot single lineages

Background shading illustrates sub-variants that we need to keep an eye on in blue (s ~ 0-5%, corresponding to doubling times of more than two weeks), those whose rapid growth is likely to displace currently common strains in pink (s ~ 5-10% with one to two week doubling times), and those with a growth advantage characteristic of a variant of concern in orange (s > 10%+ with less than one week doubling times).

Estimating selection of sub-variants with low sequence counts (10-100 dots) is prone to error, such as mistaking one-time super spreader events or pulses of sequence data from one regions as selection. These estimates should be considered as very preliminary.

source("scripts/plot_growing_lineages.R")
#paramselected=allparamstricto[sapply(allparamstricto,function(x){x$region=="Canada" & (x$fit)$fit[["s1"]]>0 & sum(x$toplot$n2)>10})]

paramselected=allparams[sapply(allparams,function(p){if(!any(is.na(p$fit))){
  x=p$mut[[1]];
  p$region=="Canada"&
  (p$fit)$fit[["s1"]]>0 & 
  sum(p$toplot$n2)>10&
  substr(x, nchar(x), nchar(x))!="*"
  }else{FALSE}})]
  
a=plot_growing_lineage(paramselected[1:min(25,length(paramselected))])

Plot (non stricto)

Plot monophyletic groups of lineages

This plot highlights the groups of related lineages that are growing fastest (e.g., BE.1.1.1* is the monophyletic clade that includes BE.1.1.1 and BQ.1*).

Again, background shading illustrates sub-variants that we need to keep an eye on in blue (s ~ 0-5%, corresponding to doubling times of more than two weeks), those whose rapid growth is likely to displace currently common strains in pink (s ~ 5-10% with one to two week doubling times), and those with a growth advantage characteristic of a variant of concern in orange (s > 10%+ with less than one week doubling times).

Estimating selection of sub-variants with low sequence counts (10-100 dots) is prone to error, such as mistaking one-time super spreader events or pulses of sequence data from one regions as selection. These estimates should be considered as very preliminary.

paramselected=allparams[sapply(allparams,function(p){if(!any(is.na(p$fit))){
  x=p$mut[[1]];
  p$region=="Canada"&
  (p$fit)$fit[["s1"]]>0 & 
  sum(p$toplot$n2)>10&
  substr(x, nchar(x), nchar(x))=="*"
  }else{FALSE}})]
  
a=plot_growing_lineage(paramselected[1:min(25,length(paramselected))])

Table

Table

paramselected=allparams[sapply(allparams,function(p){if(!any(is.na(p$fit))){
  x=p$mut[[1]];
  p$region=="Canada"&
  (p$fit)$fit[["s1"]]>0 & 
  sum(p$toplot$n2)>10
  }else{FALSE}})]
DT::datatable(plot_growing_lineage(paramselected,makeplot=FALSE))

Mutational composition of Omicron

Tabulation of the most predominant mutational changes in Omicron, with adjacent rows comparing the composition of Canadian sublineages to that sublineage globally.

Mutational profile of Omicron and its sublineages in Canada and globally for the most prevalent (>75%) point mutations in each category (based on all genomes available on VirusSeq on 2022-12-08).

Selection on Omicron

Here we examine the relative rate of spread of the different sublineages of Omicron currently in Canada. Specifically, we determine if a new or emerging lineage has a selective advantage, s (and by how much), against a reference lineage previously common in Canada (see the methods for more details about selection and how it is estimated).

Currently, the major group of Omicron lineages rising in frequency are the BQ.* group, which descend from BA.5*. We first show this growth of BQ.* and other BA.5* (here excluding BQ.*), relative to the remaining strains, which consist predominantly of BA.2* and BA.4*. Left plot: y-axis is the proportion of sub-lineages BA.4 and BA.5 among Omicron; right plot: y-axis describes the logit function, log(freq(BQ.* or BA.5*)/freq(other)), which gives a straight line whose slope is the selection coefficient if selection is constant over time (see methods).

For comparison, Alpha had a selective advantage of s ~ 6%-11% per day over preexisting SARS-CoV-2 lineages, and Delta had a selective advantage of about 10% per day over Alpha.

Caveat: These selection analyses must be interpreted with caution due to the potential for non-representative sampling, lags in reporting, and spatial heterogeneity in prevalence of different sublineages across Canada. Provinces that do not have at least 20 sequences of BQ.*, BA.5*, and other lineages during this time frame are not displayed.

Caveat: Separate analyses are provided for those provinces with at least 20 BA.2 sequences in the database. These selection analyses must be interpreted with caution due to the potential for non-representative sampling, lags in reporting, and spatial heterogeneity in prevalence of different sublineages across Canada.

startpar3 <- list(p=c(0.8, 0.1), s=c(0.1, 0.1))

setAll=getAllStrictoLineages(meta)
sublineages_BA5 <- getStrictoSubLineages("BA.5*",meta)
sublineages_BQ <- getStrictoSubLineages("BQ*",meta)

setAll=setAll[!setAll %in% sublineages_BA5]
sublineages_BA5=sublineages_BA5[!sublineages_BA5 %in% sublineages_BQ]

#color for each lineage
col <- c(pal["Omicron BA.5"], pal["Omicron BQ"])

source("scripts/plot_selection_estimator.R")

#Set a starting date
#Note that the startdate shouldn't be too much before both alleles become common
#or rare migration events that die off could throw off the estimation procedure 
#(so that the parameter estimates account for the presence of those alleles long in the past).
startdate<-as.Date(VirusSeq_release)-days(75) #Using a later date with less sampling noise

sub.plot.selection.estimate <- function(region,maxdate=NA){
  maxdate=plot.selection.estimate(region=region,
                                        startdate=startdate, startpar=startpar3,
                                        reference=c(setAll),
                                        mutants=list(sublineages_BA5,sublineages_BQ),
                                        names=list("BA.5*","BQ*","the rest"),
                                        maxdate=maxdate, col=col)
  return(maxdate)
}



#####################################################
# tabs for displaying in notebook
#each PT tab should have curve plot and breakpoint plot side by side

Canada

Canada

BC

British Columbia

## [1] "Not enough data available"

AL

Alberta

## [1] "Not enough data available"

SK

Saskatchewan

## [1] "Not enough data available"

MB

Manitoba

ON

Ontario

## [1] "Not enough data available"

QC

Quebec

East

East provinces

BA.2 sublineage selection

Here we show the trends of the various BA.2.* sublineages over time, relative to the frequency of BA.5.2 by itself (shown for sublineages with at least 50 (Canada) or 20 (provinces) cases). Proportions shown here are only among BA.5.2 (stricto) and the lineage illustrated. Note that these plots are not necessarily representative of trends in each province and that mixing of data from different provinces may lead to shifts in frequency that are not due to selection.

source("scripts/scanlineages.R")
plotsubvariant <- function(region,min,parentalnode){
  #subnodes=getAllSubLineages(parentalnode,fullpangotree)
  subnodes=getStrictoSubLineages(parentalnode,meta)
  nplot=0
  for(p in allparams) {
    if(!any(is.na(p$fit)) && p$region==region ){
      if((p$mut[[1]] %in% subnodes) && (p$fit)$fit[["s1"]]>0 && sum(p$toplot$n2)>min){
          plot.selection(plotparam=p)
          nplot=nplot+1
      }
    }
  }
  if(nplot==0){
    print("Not enough data available")
  }
}


plotba2 <- function(region,min){
  plotsubvariant(region,min,"BA.2*")
}


plotba4 <- function(region,min){
  plotsubvariant(region,min,"BA.4*")
}


plotba5 <- function(region,min){
  plotsubvariant(region,min,"BA.5*")
}

Canada

Canada (n>50)

BC

British Columbia (n>20)

[1] “Not enough data available”

AL

Alberta (n>20)

SK

Saskatchawan (n>20)

[1] “Not enough data available”

MB

Manitoba (n>20)

[1] “Not enough data available”

ON

Ontario (n>20)

QC

Quebec (n>20)

East

Eastern provinces (n>20)

[1] “Not enough data available”

BA.4 sublineage selection

Here we show the trends of the various BA.4# sublineages over time, relative to the frequency of BA.5.2 by itself (shown for sublineages with at least 50 (Canada) or 20 (provinces) cases). Proportions shown here are only among BA.5.2 (stricto) and the lineage illustrated. Note that these plots are not necessarily representative of trends in each province and that mixing of data from different provinces may lead to shifts in frequency that are not due to selection.

Canada

Canada (n>50)

BC

British Columbia (n>20)

[1] “Not enough data available”

AL

Alberta (n>20)

[1] “Not enough data available”

SK

Saskatchawan (n>20)

[1] “Not enough data available”

MB

Manitoba (n>20)

ON

Ontario (n>20)

QC

Quebec (n>20)

East

Eastern provinces (n>20)

[1] “Not enough data available”

BA.5 sublineage selection

Here we show the trends of the various BA.5# sublineages over time, relative to the frequency of BA.5.2 by itself (shown for sublineages with at least 50 (Canada) or 20 (provinces) cases). Proportions shown here are only among BA.5.2 (stricto) and the lineage illustrated. Note that these plots are not necessarily representative of trends in each province and that mixing of data from different provinces may lead to shifts in frequency that are not due to selection.

Canada

Canada (n>50)

BC

British Columbia (n>20)

[1] “Not enough data available”

AL

Alberta (n>20)

SK

Saskatchawan (n>20)

[1] “Not enough data available”

MB

Manitoba (n>20)

ON

Ontario (n>20)

QC

Quebec (n>20)

East

Eastern provinces (n>20)

Variants in Canada over time

These plots show the changing composition of sequences for all Canadian data posted to the VirusSeq Portal according to PANGO lineage/variant designation (Pango version 4.1.3 (Viral AI)), up to 2022-12-03. Because sampling and sequencing procedures vary by region and time, this does not necessarily reflect the true composition of SARS-CoV-2 viruses in Canada over time.

# focus on emergence of VoCs in Canada

meta1 <- meta[as.Date(meta$week) > Variants_Canada_over_time_Start_Date, ]
meta1$week <- as.factor(as.character(meta1$week))

dat <- lapply(unique(meta1$province), function(x) {
  as.data.frame.matrix(table(meta1$week[meta1$province==x],
                             meta1$pango_group[meta1$province==x]))
})
names(dat) <- unique(meta1$province)

dat[['Canada']] <- as.data.frame.matrix(table(meta1$week, meta1$pango_group))

# pass colour legend to JavaScript layer
dat[['legend']] <- as.list(VOCVOI$color)
names(dat[['legend']]) <- VOCVOI$name
dat[['legend']]$other <- 'grey'

r2d3(data=toJSON(dat), script="js/barplot.js", container="div",
     elementId="barplot-element")
rm(dat)
rm(meta1)

From the beginning of the pandemic to the fall of 2021, Canadian sequences were mostly of the wildtype lineages (pre-VOCs). By the beginning of summer 2021, the VOCs Alpha and Gamma were the most sequenced lineages overall in Canada. The Delta wave grew during the summer of 2021 with sublineages AY.25 and AY.27 constituting sizeable proportions of this wave. Omicron arrived in November of 2021 and spread in three main waves, first BA.1* (early 2022), then BA.2* (spring 2022), then BA.5* (summer 2022). Current, multiple sublineages of Omicron persist, with emerging sublineages spreading, such as BQ.1.1 (a BA.5 sub-lineage). See below for jurisdictional differences of these plots.

There are two PANGO lineages that have a Canadian origin and that predominately spread within Canada (with some exportations internationally): B.1.438.1 and B.1.1.176. Other lineages of historical interest in Canada:

Canadian trees

Here we present a subsampled phylogenetic snapshot of SARS-CoV-2 genomes from Canada.

### metadata and trees
source("scripts/tree.r")

# load trees from files
mltree <- read.tree("./data_needed/sample1.rtt.nwk")
ttree <- read.tree("./data_needed/sample1.timetree.nwk")
stopifnot(all(sort(mltree$tip.label) == sort(ttree$tip.label)))
dateseq <- seq(ymd('2019-12-01'), ymd('2022-11-01'), by='3 month')

# tips are labeled with [fasta name]_[lineage]_[coldate]
# extracting just the first part makes it easier to link to metadata
mltree$tip.label <- reduce.tipnames(mltree$tip.label)
ttree$tip.label <- reduce.tipnames(ttree$tip.label)

fieldnames<- c("fasta_header_name", "province", "host_gender", "host_age_bin",
               "sample_collected_by", "purpose_of_sampling",
               "lineage", "pango_group","week")

# extract rows from metadata table that correspond to this tree
metasub1 <- meta[meta$fasta_header_name%in% ttree$tip.label, fieldnames]
# sort rows to match tip labels in tree
metasub1 <- metasub1[match(ttree$tip.label, metasub1$fasta_header_name), ]
metasub_omi <- metasub1[grepl("Omicron",metasub1$pango_group ), ]
#scale to number of mutations
mltree$edge.length <- mltree$edge.length*29903

###Time Tree
ttree2 <- ttree
ttree2$edge.length[ttree2$edge.length == 0] <- 1e-4



hab=unique(meta$host_age_bin)
hab=hab[order(hab)]
presetColors=data.frame(name=c("other",VOCVOI$name,hab), color=c("#777777",VOCVOI$color,rev(hcl.colors(length(hab)-1, "Red-Blue")),"#777777"))

#suppressWarnings({
#  res <- ace(metasub1$pango.group, ttree2, type="discrete", model="ER")
#})
#idx <- apply(res$lik.anc, 1, which.max)[2:nrow(res$lik.anc)]  # exclude root edge
#anc <- levels(as.factor(metasub1$pango.group))[idx]
timeTreeJsonObj <- DrawTree(ttree2, metasub1, "timetree", presetColors, fieldnames)

#diversity ML tree
diversityTreeJsonObj <- DrawTree(mltree, metasub1, "mltree", presetColors, fieldnames)

### omicron diversity tree
MLtree_omi<-keep.tip(mltree, metasub_omi$fasta_header_name)


OmicrondiversityTreeJsonObj <- DrawTree(MLtree_omi, metasub_omi, "omimltree", presetColors, fieldnames)

Time Tree